Abstract: In this article, 180 gastric images taken with the help of a light microscope are used. Maximally Stable
Extremal Regions (MSER) features of the images have been calculated for classification. The Discrete Fourier
Transform (DFT) method has been applied to these MSER features. The high-dimensional MSER-DFT feature
vectors are reduced to lower dimensions with Local Tangent Space Alignment (LTSA) and Neighborhood
Preserving Embedding (NPE). After the dimensionality reduction, features in 5, 10, 15, 20, 25, 30, 35, 40,
45, and 50 dimensions have been obtained. These low-dimensional data are classified with the Random Forest (RF)
classifier. Thus, the MSER_DFT_LTSA-NPE_RF method for gastric histopathological images has been
developed. The classification results obtained with these methods have been compared. Compared with the other
methods, the classification results for gastric histopathological images have been found to be higher.
I. INTRODUCTION

Stomach cancer is a type of cancer that occurs in the
stomach tissue and stomach wall.
According to studies conducted by the Ministry of
Health, stomach cancer was identified as the second
most common cancer type. Endoscopy is the most
important factor in the early diagnosis of this
disease. The stomach lining is examined
endoscopically, biopsy specimens are taken, and the
diagnosis is made by pathological examination. It is
seen that half of the people who have this disease are
diagnosed late, so that doctors cannot apply any
treatment [1,2]. This disease is most common in
Far Eastern countries such as
Japan and China. In Japan, stomach cancer
accounts for about 30% of all
cancer cases. In the Americas, the number of
people with stomach cancer increases every year [1,3-4].
According to research conducted worldwide, 26% of
males and 11% of females have stomach cancer.
Stomach cancer ranks third after
lung and breast cancer in women and second
after lung cancer in men. The number
of new stomach cancer cases is estimated to be around 30
thousand a year [1,5]. S. Yoshihiro and colleagues
[6] studied stomach cancer by developing
computer-based systems that can predict the risk
factor. In the system developed, endoscopy images
were taken from patients carrying H. pylori bacteria.
Fifteen parameters were used to classify the stomach
mucosa, together with 3 background parameters. The
classification data were processed with Bayes'
theorem and outputs were obtained. This study is a
resource for the treatment of patients who are at risk
of stomach cancer or who have to undergo
endoscopy [1]. D. Ahmadzadeh [7] developed a
cancer diagnosis system using the KDM (decision
support machine) and local pattern algorithm
methods for stomach cancer diagnosis. Using the
feature identification, feature extraction and noise
reduction steps of the developed system,
an estimation accuracy of 91.8% was obtained on 55
randomly selected samples. The common feature
of both systems is that they
save the doctor time and cost [1].
Akbari et al. [8] developed a stomach cancer
diagnosis system using the infrared hyperspectral
imaging method. This study was performed on
selected patients with stomach cancer. Spectral
features were extracted from cancerous and normal
tissues and compared, and the KDM method was
used to detect the cancerous regions
from the spectral diagram. High
performance was obtained from the system by
applying HFD and log transformation to the
obtained data. In the study, cancer was detected in
25 of 30 patients, and the system achieved a
success rate of 83.3119% [1]. Stomach cancer is a
type of cancer that, when diagnosed late, leads to the
death of the patient [1,9]. Stomach cancer usually begins
with ulcer and gastritis complaints. The cancer can
affect lymph nodes and other peripheral organs
[1,10].


The theory and methods of the study are given in Section II.
Experimental results and discussion are presented in
Section III, and conclusions are given in Section IV.

The purpose of this article is to add a different
study to the literature to help the early diagnosis of
stomach cancer using histopathology images. The
novelty of this study is that, when studies in the literature
are examined, the MSER, DFT, LTSA,
NPE and RF methods are not seen to have been used together to help
diagnose gastric cancer early. We have put forward
a more powerful computer-aided method that uses
these methods together. Also,
the classification performances have been compared
according to the number of selected MSER features.


II. THEORY AND METHOD

A. Maximally Stable Extremal Regions (MSER) 

The MSER algorithm is used to find
circle- or ellipse-like shapes (blobs) in images. The
algorithm selects key points by taking these shapes into
account and calculates features at these key
points [11]. An MSER region [12] consists of a set
of connected points above a certain threshold
value, and it persists as the threshold value
changes. In other words, the
selected region is a local binarization that remains
stable over a range of threshold values. According to the
study given in Reference [13], finding MSER
regions works in a similar way to the watershed
method. The threshold value is
varied over the range [0-255], and connected
regions that do not change, or change very
little, over all thresholded images are called MSER regions.
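
The threshold sweep described above can be illustrated with a minimal sketch. It is not the implementation used in this study: the tiny test image, the seed point and the function name region_area are hypothetical, and the region is grown with a plain breadth-first search rather than an efficient MSER implementation.

```python
from collections import deque

def region_area(image, seed, threshold):
    """Area of the 4-connected region of pixels <= threshold containing seed."""
    rows, cols = len(image), len(image[0])
    if image[seed[0]][seed[1]] > threshold:
        return 0  # the seed itself is not yet part of any region
    seen = {seed}
    queue = deque([seed])
    while queue:
        r, c = queue.popleft()
        for nr, nc in ((r + 1, c), (r - 1, c), (r, c + 1), (r, c - 1)):
            inside = 0 <= nr < rows and 0 <= nc < cols
            if inside and (nr, nc) not in seen and image[nr][nc] <= threshold:
                seen.add((nr, nc))
                queue.append((nr, nc))
    return len(seen)

# Hypothetical 5x5 image: a dark 3x3 blob (value 10) on a bright background (200).
image = [[200] * 5 for _ in range(5)]
for r in range(1, 4):
    for c in range(1, 4):
        image[r][c] = 10

# Area of the region around the blob centre for several thresholds.
areas = {t: region_area(image, (2, 2), t) for t in (5, 10, 100, 199, 200, 255)}
print(areas)
```

In this toy case the area stays at 9 pixels for every threshold from 10 to 199, which is the kind of persistence that marks a region as maximally stable.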

In the implementation of the method, the
image points are sorted by brightness value and
placed in the image in ascending or
descending order. During this process, the
connected components and their areas are found with
the union-find (join detection) algorithm and kept in a list. At each
step of the thresholding process, when two regions
merge, the smaller region is absorbed into
the larger region and removed
from the list. The threshold values at which the
change of the region area is minimal as the
threshold value is increased or decreased are selected as the
threshold ranges producing the maximally stable
extremal regions. In other words, the
extremal regions formed by the connected
components over all threshold values are represented
as a sequence of nested regions $Q_1, \ldots, Q_{i-1}, Q_i, \ldots$.
This sequence satisfies the condition $Q_i \subset Q_{i+1}$. For
the extremal region $Q_{i^*}$ of the sequence to be selected as
maximally stable [13], the expression
$q(i) = |Q_{i \pm \Delta}| / |Q_i|$ must have a local minimum at $i^*$.
In this expression, $|\cdot|$ denotes the area of
the region, and $\Delta$ is a parameter of the
method. The ± sign indicates that local
minima are sought for both decreasing and increasing
threshold values. This process is applied to all the
region sequences in the set of extremal regions to obtain
the maximally stable extremal regions [13].
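
The selection of stable regions from a sequence of areas can be sketched as follows. This is an illustration under a stated assumption: it uses the common area-variation form q(i) = (|Q_{i+Δ}| - |Q_{i-Δ}|) / |Q_i|, which combines the increasing and decreasing directions in a single ratio and is one reading of the ± expression above. The area values are invented for the example.

```python
def stability(areas, delta):
    """Relative area change q(i) for every index i that has both neighbours."""
    return {
        i: (areas[i + delta] - areas[i - delta]) / areas[i]
        for i in range(delta, len(areas) - delta)
    }

def maximally_stable(areas, delta):
    """Indices i* at which q(i) has a strict local minimum."""
    q = stability(areas, delta)
    inner = sorted(q)[1:-1]  # indices whose q-neighbours i-1 and i+1 both exist
    return [i for i in inner if q[i] < q[i - 1] and q[i] < q[i + 1]]

# Hypothetical areas |Q_0|, |Q_1|, ... of one nested region sequence.
areas = [10, 30, 60, 64, 66, 68, 110, 200, 400]
stable = maximally_stable(areas, delta=1)
print(stable)  # [4]
```

The region at index 4 is selected because its area barely changes between the neighbouring thresholds (64, 66, 68), while the regions before and after it grow quickly.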

In the MSER method, neighboring image
points having a similar color are merged by
agglomerative (stacked) clustering. For the
clustering process, the color distances between each point and its four or
eight neighboring points are kept in
a list. At each step of the algorithm, $t \in
[0 \ldots T]$, the image points are labeled progressively.
If the coordinate space of the image points is
denoted as the set $\Omega = [1 \ldots L] \times [1 \ldots M] \subset
\mathbb{Z}^2$, each step is expressed as a mapping $E_t : \Omega \to \mathbb{N}$. As
a result of the labeling, connected points with the
same label form the extremal regions of the image.
The distance between all neighboring points of an
extremal region must be lower than $d_{thr}(t)$,
which is a threshold value calculated for the step in
question. The distance between image points in the
color space is calculated using the chi-square
distance. Initially, all values in the $E_0$ label image are
set to 0. In the $E_t$ label image, all neighboring points with
a distance less than $d_{thr}(t)$ are given a new
region label, and the $E_{t+1}$ label image is obtained [13]. Due to
the spatial relationship between the image points,
the distances between neighboring points do not follow
a uniform distribution. The vast majority of
distances have small values, and only very few
large distance values exist. Therefore, if the
threshold value is increased linearly at each step,
a very large number of label changes occurs at
the beginning and far fewer occur toward the end of the
steps [13]. In order to
change the labels of an equal number of image points
in each step, the distance between neighboring
points in the image is taken as a random variable and
the threshold values are chosen according to the
inverse of the cumulative distribution
function (CDF) of this random variable [13]. The
chi-square CDF for color images is calculated as
in Equation (1)

$c_3(x) = -\sqrt{\frac{6x}{\mu\pi}}\, e^{-3x/(2\mu)} + \operatorname{erf}\left(\sqrt{\frac{3x}{2\mu}}\right)$ (1)

where $\mu$ is the mean of the sample set. As a result,
Equation (2) is used to find the threshold values after
the mean has been estimated.

$d_{thr}(t) = c_3^{-1}(t/T), \quad t \in [0 \ldots T]$ (2)

Next, the area changes of the extremal regions detected
by increasing the value of $d_{thr}(t)$ at each step are
checked, and the maximally stable ones are detected.
In addition, those smaller than a given size are
eliminated from the maximally stable extremal
regions [13].
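
The threshold schedule of Equations (1) and (2) can be sketched numerically. The example below is only an illustration: the bisection-based inversion, the function names and the value of the mean are assumptions made here, and the last step t = T is left out because the inverse CDF is unbounded at probability 1.

```python
import math

def chi2_cdf3(x, mu):
    """Equation (1): CDF of a 3-degree-of-freedom chi-square distance with mean mu."""
    if x <= 0.0:
        return 0.0
    a = 3.0 * x / (2.0 * mu)
    return math.erf(math.sqrt(a)) - math.sqrt(6.0 * x / (mu * math.pi)) * math.exp(-a)

def inverse_cdf(p, mu, iterations=200):
    """Invert chi2_cdf3 by bisection; p must lie in [0, 1)."""
    if p <= 0.0:
        return 0.0
    lo, hi = 0.0, mu
    while chi2_cdf3(hi, mu) < p:  # grow the bracket until it contains the root
        hi *= 2.0
    for _ in range(iterations):
        mid = 0.5 * (lo + hi)
        if chi2_cdf3(mid, mu) < p:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def thresholds(steps, mu):
    """Equation (2): d_thr(t) = c_3^{-1}(t / T) for t = 0 .. T-1."""
    return [inverse_cdf(t / steps, mu) for t in range(steps)]

# Hypothetical setting: T = 10 steps and an estimated mean distance of 1.0.
d_thr = thresholds(10, 1.0)
print(d_thr)
```

Each step then covers an equal share of the neighbor distances, so the thresholds are closely spaced where distances are common and widely spaced in the sparse tail.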


B. Discrete Fourier Transform (DFT) 

Fourier transformation is a mathematical method
that allows a periodic signal to be expressed by sine
and cosine components at different frequencies [14].
The Fourier transform is defined for a sequence of
infinite length and, more importantly, is a function of
the angular frequency ω, which is a continuous
variable. When using MATLAB, however, sequences must
be of finite length and the transform can be evaluated only at a finite number
of points. The Discrete Fourier Transform (DFT)
eliminates these problems. In this article, the
DFT of the multidimensional
MSER feature values is calculated using a Fast
Fourier Transform (FFT) algorithm. The discrete
Fourier transform is expressed as follows

$X(k) = \sum_{n=0}^{N-1} x(n) W_N^{nk}, \quad 0 \le k \le N-1$ (3)

where $W_N = e^{-j 2\pi / N}$. If the sequence length N is
large, the direct computation of the DFT requires a large
amount of processing. That is, as N
increases, the number of operations increases
rapidly and reaches an
unacceptable level. In 1965, Cooley and Tukey
developed a procedure to reduce the amount of
processing required for the Discrete Fourier Transform
[14]. This procedure caused a sudden increase in
DFT applications in digital signal processing and
other fields. It has also been a pivotal step in the
development of other algorithms. All these
algorithms are known as Fast Fourier Transform
(FFT) algorithms. These algorithms have greatly
reduced the number of operations required for the
DFT computation, thereby making it practical.
FFT is an efficient and economical algorithm for
DFT computation [15].
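
The difference between the direct evaluation of Equation (3) and an FFT can be shown with a short sketch. It is written in plain Python for illustration only; the recursive radix-2 scheme shown here is one member of the FFT family and requires the sequence length to be a power of two, and the sample sequence is arbitrary.

```python
import cmath

def dft(x):
    """Direct evaluation of Equation (3); needs on the order of N^2 operations."""
    n = len(x)
    return [
        sum(x[m] * cmath.exp(-2j * cmath.pi * m * k / n) for m in range(n))
        for k in range(n)
    ]

def fft(x):
    """Recursive radix-2 FFT; len(x) must be a power of two."""
    n = len(x)
    if n == 1:
        return [complex(x[0])]
    even, odd = fft(x[0::2]), fft(x[1::2])
    out = [0j] * n
    for k in range(n // 2):
        twiddle = cmath.exp(-2j * cmath.pi * k / n) * odd[k]
        out[k] = even[k] + twiddle
        out[k + n // 2] = even[k] - twiddle
    return out

# Arbitrary example sequence of length 8.
signal = [1.0, 2.0, 3.0, 4.0, 0.0, 0.0, 0.0, 0.0]
direct = dft(signal)
fast = fft(signal)
max_difference = max(abs(a - b) for a, b in zip(direct, fast))
print(max_difference)
```

Both functions return the same spectrum up to rounding error; the FFT reaches it with on the order of N log N operations instead of N squared.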


C. Local Tangent Space Alignment (LTSA) 

Given A m-dimensional points sampled, possibly
with noise, from an underlying d-dimensional
manifold, this algorithm produces A d-dimensional
coordinates $T \in \mathbb{R}^{d \times A}$ for the manifold, constructed
from f local nearest neighbors [16].


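Step 1. Extracting local information:

For each $a = 1, \ldots, A$, determine the f nearest
neighbors $x_{ab}$ of $x_a$, $b = 1, \ldots, f$. Compute the d
largest eigenvectors $g_1, \ldots, g_d$ of the
correlation matrix $(X_a - \bar{x}_a e^T)^T (X_a - \bar{x}_a e^T)$, where
$X_a = [x_{a1}, \ldots, x_{af}]$, $\bar{x}_a$ is the mean of these
neighbors and e is a vector of ones, and
set

$G_a = [e/\sqrt{f}, g_1, \ldots, g_d]$ (4)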

Step 2. Constructing the alignment matrix: If a direct
eigenvalue solver is used, form the
alignment matrix $\Phi$ by local summation.
Otherwise, implement a routine that computes the matrix-vector
product $\Phi u$ for an arbitrary vector u [16].

Step 3. Computing the global coordinates:
Compute the d + 1 smallest eigenvectors of $\Phi$, select
the eigenvector matrix $[u_2, \ldots, u_{d+1}]$
corresponding to the 2nd through (d+1)st smallest eigenvalues, and set
$T = [u_2, \ldots, u_{d+1}]^T$ [16].
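
The local summation mentioned in Step 2 can be written out as follows. This is a sketch based on the standard LTSA formulation, restated in the notation used here; the index set $I_a$ (the indices of the f nearest neighbors of $x_a$) and the identity matrix $I$ are symbols introduced for this illustration.

```latex
% Start from \Phi = 0 (an A x A matrix). For each a = 1, ..., A, add the
% local contribution on the rows and columns indexed by I_a.
\Phi(I_a, I_a) \leftarrow \Phi(I_a, I_a) + I - G_a G_a^T,
\qquad a = 1, \ldots, A
```

Since each $G_a$ contains the column $e/\sqrt{f}$ and the vectors $g_1, \ldots, g_d$ are orthogonal to e, the constant vector is an eigenvector of $\Phi$ with eigenvalue zero. This is why Step 3 discards the first eigenvector and starts from $u_2$.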


D. Neighborhood Preserving Embedding (NPE) 

Neighborhood Preserving Embedding (NPE) is a
linear approximation to the Locally Linear Embedding (LLE) algorithm [17]. The
algorithmic procedure is formally described below:

1. Constructing an adjacency graph: Let G denote a
graph with m nodes. The i-th node corresponds to
the data point $a_i$. There are two ways to construct the
adjacency graph [17]:

• K nearest neighbors: Put a directed edge from
node i to node u if $a_u$ is among the K nearest neighbors of
$a_i$.

• ε neighborhood: Put an edge
between nodes i and u if $\|a_u - a_i\| \le \varepsilon$.

The graph constructed by the first method is a directed
graph, while the one constructed by the second method is an undirected
graph. It is difficult to choose a good ε in many real-world
applications. Therefore, we adopt the
KNN method to construct the adjacency graph.

2. Computing the weights: In this step, the weights
on the edges are calculated. Let W denote the
weight matrix, with $W_{iu}$ holding the weight of the
edge from node i to node u, and 0 if there is no such
edge. The weights on the edges can be calculated by
minimizing the following objective function [17],

$\min \sum_i \left\| a_i - \sum_u W_{iu} a_u \right\|^2$ (5)

with constraints

$\sum_u W_{iu} = 1, \quad i = 1, 2, \ldots, m$ (6)
3. Computing the projections: In this step, the
linear projections are calculated. Solve the
following generalized eigenvector problem [17]:

$A K A^T x = \lambda A A^T x$ (7)

where

$A = (a_1, \ldots, a_m)$

$K = (I_m - W)^T (I_m - W)$

$I_m = \mathrm{diag}(1, \ldots, 1)$ (8)

It is easy to check that K is symmetric. Let the
column vectors $x_0, \ldots, x_{d-1}$ be the solutions of
Equation (7), ordered according to their eigenvalues,
$\lambda_0 \le \cdots \le \lambda_{d-1}$. Thus, the embedding is as
follows:

$a_i \to y_i = X^T a_i$

$X = (x_0, x_1, \ldots, x_{d-1})$ (9)

where $y_i$ is a d-dimensional vector and X is an n × d
matrix.
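
The first two steps of NPE (the K-nearest-neighbor graph and the weights of Equations (5) and (6)) can be sketched in plain Python. This is an illustration, not the implementation used in this study: the function names, the small Gaussian-elimination helper and the regularization term added to the local Gram matrix are assumptions, the last being a common safeguard when the neighbors are linearly dependent.

```python
def knn(points, i, k):
    """Indices of the k nearest neighbours of points[i] (Euclidean distance)."""
    def dist2(u):
        return sum((p - q) ** 2 for p, q in zip(points[i], points[u]))
    others = [u for u in range(len(points)) if u != i]
    return sorted(others, key=dist2)[:k]

def solve(matrix, rhs):
    """Gaussian elimination with partial pivoting for a small dense system."""
    n = len(rhs)
    aug = [row[:] + [b] for row, b in zip(matrix, rhs)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    x = [0.0] * n
    for r in reversed(range(n)):
        s = sum(aug[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (aug[r][n] - s) / aug[r][r]
    return x

def npe_weights(points, i, k, reg=1e-3):
    """Weights W_iu minimising Equation (5) under the constraint of Equation (6)."""
    nbrs = knn(points, i, k)
    diffs = [[p - q for p, q in zip(points[i], points[u])] for u in nbrs]
    gram = [[sum(a * b for a, b in zip(du, dv)) for dv in diffs] for du in diffs]
    trace = sum(gram[j][j] for j in range(k))
    for j in range(k):
        gram[j][j] += reg * trace  # regularisation (assumption) for singular cases
    w = solve(gram, [1.0] * k)
    total = sum(w)
    return {u: wj / total for u, wj in zip(nbrs, w)}

# Hypothetical data: point 1 lies midway between points 0 and 2.
points = [(0.0, 0.0), (1.0, 1.0), (2.0, 2.0), (10.0, 0.0)]
weights = npe_weights(points, 1, 2)
print(weights)
```

For the midpoint example the two neighbors receive equal weights of 0.5, so the point is reconstructed exactly from its neighbors and the weights sum to one as required by Equation (6).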


E. Random Forest (RF) 

RF is a collection of tree-type classifiers. The Gini
index is used in the RF classifier: the split position
is chosen as the one with the smallest Gini index.
To generate a tree with the
RF classifier, two external parameters must be
entered. These parameters are m and N.
m is the number of variables used at each node to
determine the best split, and N is
the number of trees to be grown [18,19]. The initial m
value is entered arbitrarily from the outside.
Subsequent values of m are decreased or increased according to
the overall error rate. Classification accuracy is
assessed through the generalization error. P new training
data sets are generated from the training data R. The tree-type
classifiers in the RF classifier are of the form
$\{h(x, \Theta_p),\ p = 1, \ldots\}$, where x is the input data and
$\Theta_p$ denotes the random vector. The $h(x, R_p)$
classifier is constructed using the new training data
set $R_p$. The pair (x, y) is not contained in $R_p$. When a random
pixel is selected from a given training data set R, this
pixel belongs to some class $S_i$. Therefore, the Gini index is
expressed as below.

$\sum\sum_{j \neq i} \left( f(S_i, R)/|R| \right) \left( f(S_j, R)/|R| \right)$ (10)


Here, R is the training data set, $S_i$ is the class to
which a randomly selected pixel belongs, and $f(S_i, R)/|R|$
indicates the probability that the selected example belongs to
class $S_i$.
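
Equation (10) and its use for choosing a split can be illustrated with a small sketch. The function names and the toy feature values are hypothetical, and the size-weighted average of the two child nodes is the usual way of scoring a split, assumed here rather than taken from this study.

```python
from collections import Counter

def gini_index(labels):
    """Equation (10): sum over i != j of p_i * p_j, with p_i = f(S_i, R) / |R|."""
    total = len(labels)
    probs = [count / total for count in Counter(labels).values()]
    return sum(p * q for i, p in enumerate(probs) for j, q in enumerate(probs) if i != j)

def best_split(values, labels):
    """Threshold on one feature with the smallest size-weighted Gini index."""
    pairs = sorted(zip(values, labels))
    best = None
    for k in range(1, len(pairs)):
        if pairs[k - 1][0] == pairs[k][0]:
            continue  # no threshold fits between equal values
        left = [lab for _, lab in pairs[:k]]
        right = [lab for _, lab in pairs[k:]]
        score = (len(left) * gini_index(left) + len(right) * gini_index(right)) / len(pairs)
        threshold = (pairs[k - 1][0] + pairs[k][0]) / 2
        if best is None or score < best[0]:
            best = (score, threshold)
    return best

# A balanced three-class node, like a training set with 30 images per class.
balanced = ["normal"] * 30 + ["benign"] * 30 + ["malign"] * 30
print(gini_index(balanced))

# Hypothetical one-dimensional feature with two well-separated classes.
split = best_split([1, 2, 3, 10, 11, 12], list("aaabbb"))
print(split)
```

A balanced three-class node has a Gini index of 2/3, a pure node has 0, and the split search picks the threshold that separates the two toy classes completely.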

In this article, the MSER features have been used.
The dimensions of these features have been reduced
to lower dimensions with the help of the DFT-LTSA and
DFT-NPE methods. These lower-dimensional features
have been classified with the Random Forest (RF)
method. The steps of this article are shown in Fig. 1.


Figure 1. Steps of applying the MSER_DFT_LTSA-NPE_RF
algorithms. Features of the histopathological images:
Maximally Stable Extremal Regions (MSER). Dimensional
reduction of the features: Local Tangent Space Alignment
(LTSA), Neighborhood Preserving Embedding (NPE).
Classifier: Random Forest (RF).


III. EXPERIMENTAL RESULTS AND 
DISCUSSION 


The MSER features of the stomach histopathology images
were extracted. The MSER feature set of
each image is very high-dimensional, and its size
differs from image to image. Processing
MSER features that differ in size and
have very high dimensions takes a long time. For this reason, the
DFT method is applied to the MSER features:
the cells obtained for the MSER features
are converted into vectors, and values of the same size
are obtained for all the images.
TABLE 1. EXAMPLES OF DIFFERENT DIMENSIONAL
CELLS FOUND FOR MSER FEATURES

MSER cells for    MSER cells for    MSER cells for
normal images     benign images     malign images

52800x1           98944x1           66944x1
34368x1           56000x1           70464x1
59776x1           41984x1           74560x1
60544x1           30656x1           81856x1
54016x1           38976x1           73600x1
46016x1           83072x1           65344x1
51712x1           80256x1           60416x1
52480x1           35264x1           54784x1
53184x1           24704x1           51712x1
48192x1           30528x1           55744x1
50432x1           41920x1           81280x1
39488x1           78912x1           124288x1
52544x1           43520x1           80960x1
75904x1           34688x1           112768x1
63872x1           53248x1           121088x1
58304x1           43520x1           94080x1


As shown in Table 1, the number of MSER
features obtained differs from
image to image. We have applied the DFT method to
these feature values because their number
is different for each image.
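
One plausible way to obtain equal-length vectors from cells of different sizes is an n-point DFT with zero-padding, analogous to calling MATLAB's fft(x, n). The text does not state how the common length was reached, so the sketch below is an assumption, as are the use of the magnitude spectrum, the function name and the toy vectors (real MSER cells are far longer).

```python
import cmath

def fixed_length_dft(x, n):
    """n-point DFT of x after zero-padding (or truncating) x to length n."""
    padded = list(x[:n]) + [0.0] * max(0, n - len(x))
    return [
        sum(padded[m] * cmath.exp(-2j * cmath.pi * m * k / n) for m in range(n))
        for k in range(n)
    ]

# Hypothetical feature vectors of different lengths.
short_vec = [1.0, 2.0, 3.0]
long_vec = [4.0, 3.0, 2.0, 1.0, 0.5]
n = 8  # common length chosen for the example

# Magnitude spectra: one row of equal length per input vector.
features = [[abs(c) for c in fixed_length_dft(v, n)] for v in (short_vec, long_vec)]
print(len(features[0]), len(features[1]))
```

Both rows have the same length n regardless of the input size, so they can be stacked into one feature matrix before dimensionality reduction.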


When the DFT method is applied, the feature size
obtained for normal images is 60x183680. It is
60x197888 for benign images and 60x379008 for
malign images. Finally, the LTSA and NPE dimensionality
reduction methods were applied to the feature values of
these images. Feature numbers of 5, 10, 15, 20, 25,
30, 35, 40, 45 and 50 were examined for the LTSA and NPE
methods, respectively.
Of the 180 images, 30 normal + 30 benign + 30 malign images were
used for training and
30 normal + 30 benign + 30 malign images were used for
testing. The test images were
classified with the random forest classifier.
Table 2 shows the classification performances obtained with the
RF classifier according to the selected feature
numbers. According to Table 2, the
highest classification performance obtained with
LTSA is 94.44%. This rate was obtained by
selecting 15 features. However, the highest
classification performance obtained with NPE is
86.66%. This rate was obtained by selecting 10
features. Based on these results, the LTSA
reduction method applied to the MSER_DFT feature
values showed higher classification performance
than the NPE reduction method. As shown in
Table 2, the classification performance of both LTSA and NPE
tends to decrease as the number of features
increases beyond these values. This is shown in the graph in Figure 2.


TABLE 2. CLASSIFICATION PERFORMANCES OF RF CLASSIFIER ACCORDING TO SELECTED FEATURE NUMBER

                          Classification Performances (%)

Selected Feature Number   5      10     15     20     25     30     35     40     45     50

MSER_DFT_LTSA_RF          92.22  91.11  94.44  87.77  88.88  82.22  77.77  68.88  54.44  46.66
MSER_DFT_NPE_RF           84.44  86.66  81.11  76.66  66.66  65.55  56.66  50.00  53.33  47.77
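
The percentages in Table 2 can be related to counts of correctly classified test images. The sketch below assumes that each accuracy is computed over the 90 test images and reported truncated to two decimals; both points are assumptions inferred from the values, and the function name is hypothetical.

```python
def correct_count(accuracy_percent, n_test=90):
    """Number of correctly classified test images implied by a reported accuracy."""
    target = round(accuracy_percent * 100)  # e.g. 94.44 -> 9444
    matches = [k for k in range(n_test + 1) if (10000 * k) // n_test == target]
    return matches[0] if matches else None

ltsa = [92.22, 91.11, 94.44, 87.77, 88.88, 82.22, 77.77, 68.88, 54.44, 46.66]
npe = [84.44, 86.66, 81.11, 76.66, 66.66, 65.55, 56.66, 50.00, 53.33, 47.77]

ltsa_counts = [correct_count(a) for a in ltsa]
npe_counts = [correct_count(a) for a in npe]
print(ltsa_counts)  # [83, 82, 85, 79, 80, 74, 70, 62, 49, 42]
print(npe_counts)   # [76, 78, 73, 69, 60, 59, 51, 45, 48, 43]
```

Under these assumptions the best LTSA result of 94.44% corresponds to 85 of 90 test images, and the best NPE result of 86.66% corresponds to 78 of 90.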



Figure 2. Comparison of the classification performances found
with the MSER_DFT_LTSA_RF and MSER_DFT_NPE_RF
algorithms (horizontal axis: feature number, 5 to 50)


In Table 3, GLCM denotes Gray Level Co-occurrence
Matrix features, LBP denotes Local Binary Patterns
features, LPP denotes Locality Preserving Projections for
dimensional reduction, HOG denotes Histograms of
Oriented Gradients features, LDA denotes Linear
Discriminant Analysis and ANN denotes Artificial Neural
Network.

In Table 3, the accuracy rate of HOG_LDA_ANN
is 88.9%, the accuracy rate of LBP_LPP_ANN
is 85.56% and the accuracy rate of GLCM_LPP_ANN
is 80.12%. The accuracy rate of our method,
MSER_DFT_LTSA_RF, has been
found to be 94.44%.
TABLE 3. COMPARISON OF DIFFERENT RESULTS

Compared Method                 Accuracy

Our Method MSER_DFT_LTSA_RF     94.44%
Our Method MSER_DFT_NPE_RF      86.66%
HOG_LDA_ANN [1]                 88.9%
GLCM_LPP_ANN [1]                80.12%
LBP_LPP_ANN [20]                85.56%


IV. CONCLUSION 

To date, many classification methods have
been used in the field of health [21-33]. In this
article, 180 gastric images taken with the help of a
light microscope have been classified. Maximally Stable
Extremal Regions (MSER) features of the images
have been calculated for classification. The Discrete Fourier
Transform (DFT) method has been applied to these MSER
features. The high-dimensional
MSER-DFT feature vectors are reduced to
lower dimensions with Local Tangent Space
Alignment (LTSA) and Neighborhood Preserving
Embedding (NPE). After the dimensionality reduction,
features in 5, 10, 15, 20, 25, 30, 35, 40,
45, and 50 dimensions have been obtained. These
low-dimensional data are classified with the Random
Forest (RF) classifier. Thus, the
MSER_DFT_LTSA-NPE_RF method for gastric
histopathological images has been developed. The
highest classification performance obtained with
LTSA is 94.44%. This rate was obtained by
selecting 15 features. However, the highest
classification performance obtained with NPE is
86.66%. The classification results obtained with these
methods have been compared with other
classification results in the literature. Compared with
the other methods, our classification results for
gastric histopathological images have been seen to
be higher. In future studies, an analysis will be
performed by applying different feature extraction
methods to different cancer images.